The Construction of a Tagged Danish Corpus

Authors

  • Thomas Bilgram
  • Britt Keson
Abstract

The object of this paper is to present ongoing work on the construction of a morphosyntactically tagged Danish corpus, which is an integral step in the making of a Constraint Grammar (CG) parser for Danish and also constitutes a part of the Danish contribution to the European PAROLE project. This paper discusses various aspects of the morphological description of Danish used here as well as some of the guidelines developed for the manual disambiguation process. Finally, it also briefly gives an overview of the objectives of the two projects involved.

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

From Treebank to Propbank: A Semantic-Role and VerbNet Corpus for Danish

This paper presents the first version of a Danish Propbank/VerbNet corpus, annotated at both the morphosyntactic, dependency and semantic levels. Both verbal and nominal predications were tagged with frames consisting of a VerbNet class and semantic role-labeled arguments and satellites. As a second semantic annotation layer, the corpus was tagged with both a noun ontology and NER classes. Draw...

The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0

In order to build an automatic named entity recognition (NER) system for machine learning, a large tagged corpus is necessary. This paper describes the manual construction of a Chinese named entity tagged corpus (CNEC 1.0) that can be used to improve NER performance. In this project, we define five named entity tags: PER (person name), LOC (location name), ORG (organization name), LAO (location...

Comma checking in Danish

This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect commas in Danish. Trained on a part-of-speech tagged corpus of 600,000 words, the system identifies incorrect commas with a precision of 91% and a recall of 77%. The system was developed by randomly inserting commas in a text, which were tagged as incorrect, while the original commas were tagged...

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

Publication date: 1998